Dynamic Orchestration of Massively Data Parallel Execution.
Graphics processing units (GPUs) are specialized hardware accelerators
capable of rendering graphics much faster than conventional
general-purpose processors. They are widely used in personal computers,
tablets, mobile phones, and game consoles. Modern GPUs are not only
efficient at manipulating computer graphics, but are also more effective
than CPUs for algorithms in which large blocks of data can be processed
in parallel, mainly because of their highly parallel architecture.
While GPUs provide low-cost and efficient
platforms for accelerating massively parallel applications, tedious
performance tuning is required to maximize application execution
efficiency. Achieving high performance requires programmers to
manually manage the amount of on-chip memory used per thread, the total
number of threads per multiprocessor, the pattern of off-chip memory
accesses, and so on.
In addition to a complex programming model, there is a lack of performance
portability across various systems with different runtime properties. Programmers usually make assumptions about
runtime properties when they write code and optimize that code based
on those assumptions. However, if any of these properties changes
during execution, the optimized code performs poorly. To alleviate these
limitations, several implementations of the application are needed to
maximize performance for different runtime properties. It is not
practical, however, for the programmer to write several versions of the
same code, each optimized for an individual runtime condition.
In this thesis, we propose a static and dynamic compiler framework to
take the burden of fine-tuning different implementations of the same code
off the programmer. This framework enables the programmer to write the
program once and lets a static compiler generate different versions of
a data parallel application with several tuning parameters. The runtime
system selects the best version and fine-tunes its parameters based on
runtime properties such as device configuration, input size, dependency,
and data values.
PhD, Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/108805/1/mehrzads_1.pd